Abstract

This paper presents the Nataf-Beta Random Field Classifier, a discriminative approach that extends the applicability of the Beta conjugate prior to classification problems. The key feature of the approach is to model the probability of a class conditional on attribute values as a random field whose marginals are Beta distributed, and where the parameters of the marginals are themselves described by random fields. Although the classification accuracy of the proposed approach does not statistically outperform the best accuracies reported in the literature, it ranks among the top tier for the six benchmark datasets tested.

The Nataf-Beta Random Field Classifier is suited as a general purpose 
classification approach for real-continuous and real-integer attribute 
value problems. 

Keywords: Classification, Beta distribution, Nataf distribution, Random field, 
Conjugate prior, Gaussian process 


1 Introduction 

A large number of classification algorithms have been developed and already achieve excellent performance in real-life classification contexts [8]. The goal of this paper is to present a new method that extends the applicability of the Beta conjugate prior to classification problems. The main incentives for such a method are (1) to have a probabilistic framework that is genuinely compatible with classification problems, and (2) to allow for the same intuitive interpretation as the Beta conjugate prior, where the posterior probability density function (pdf) describing the probability of a class depends on the number of positive and negative observations.

This paper presents a new discriminative classification approach; its key feature is to model the probability of a class conditional on attribute values as a random field whose marginals are Beta distributed, and where the parameters of the marginals are themselves described by random fields. Section 2 presents the mathematical formulation of the Nataf-Beta Random Field Classifier; Section 3 validates the approach using both simulated and benchmark datasets; Section 4 compares the proposed approach with Gaussian process classification, a methodology that also relies on random fields; Section 5 discusses the limitations of the current approach and provides guidance for future extensions.


2 Methodology 


The notation used in this paper is the following: lower-case letters, e.g. $x$, denote standard variables and indexes; upper-case letters, e.g. $X$, denote random variables. A hat symbol denotes an estimate, e.g. $\hat{x}$. Bold characters, i.e. $\mathbf{x}$ or $\mathbf{X}$, represent vectors and matrices, and calligraphic letters, e.g. $\mathcal{X}$, represent sets. Lower-case Courier (typewriter) letters represent the length of sets and vectors, e.g. $\mathbf{x} = [x_1, x_2, \dots, x_{\mathtt{x}}]$. $f(x) = \Pr(X = x)$ denotes a probability density function (pdf), and $F(x) = \Pr(X \le x)$ denotes a cumulative distribution function (cdf). The superscripts in $f'(x)$ and $f''(x)$ respectively denote prior and posterior pdfs. The tilde symbol, i.e. $\tilde{f}(x)$, denotes the predictive estimate of $f(x)$.

Given $c \in \{0, 1\}$, a binary indicator variable referring to either class “0” or “1”, $c(\mathbf{x})$ describes a class label as a function of an $\mathtt{x}$-dimensional vector of continuous attribute values, $\mathbf{x} = [x_1, x_2, \dots, x_{\mathtt{x}}]^\mathsf{T}$, where $x_i \in \mathbb{R}$ for all $i$. The knowledge of the possible values of $c(\mathbf{x})$ is represented by a Bernoulli random variable $C(\mathbf{x}) \sim \mathrm{Ber}(p(\mathbf{x}))$, where $p(\mathbf{x}) = \Pr(C(\mathbf{x}) = 1)$. Given a set of statistically independent observations $\mathcal{D} = \{c_i(\mathbf{x}_i)\}_{i=1}^{\mathtt{d}}$, all sharing a common vector of attribute values $\mathbf{x}_i = \mathbf{x}^*, \forall i$, the posterior pdf of $p(\mathbf{x}^*)$ can be defined using the Beta conjugate prior [8], so that


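$$f''\big(p(\mathbf{x}^*)\,|\,\mathcal{D}\big) = \mathrm{Beta}\big(p(\mathbf{x}^*); a(\mathbf{x}^*), b(\mathbf{x}^*)\big) = \frac{p(\mathbf{x}^*)^{a(\mathbf{x}^*)-1}\,\big(1 - p(\mathbf{x}^*)\big)^{b(\mathbf{x}^*)-1}}{B\big(a(\mathbf{x}^*), b(\mathbf{x}^*)\big)} \tag{1}$$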

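where $a(\mathbf{x}^*)$ and $b(\mathbf{x}^*)$ correspond, respectively, to the numbers of observations $\check{a}(\mathbf{x}^*)$ and $\check{b}(\mathbf{x}^*)$ for which $c(\mathbf{x}^*) = 1$ and $c(\mathbf{x}^*) = 0$, so that

$$\begin{aligned} a(\mathbf{x}^*) &= \check{a}(\mathbf{x}^*) = \#\{i : c_i(\mathbf{x}^*) = 1\}_{i=1}^{\mathtt{d}} \\ b(\mathbf{x}^*) &= \check{b}(\mathbf{x}^*) = \#\{i : c_i(\mathbf{x}^*) = 0\}_{i=1}^{\mathtt{d}} \end{aligned} \tag{2}$$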

Figure 1 presents examples of Beta pdfs for several sets of observations. If an infinite number of observations were available for all $\mathbf{x}$, the approach above would provide exact estimates of the true probability $p^{\mathrm{true}}(\mathbf{x})$, so that

$$\lim_{\mathtt{d} \to \infty} f''\big(p(\mathbf{x})\,|\,\mathcal{D}\big) = \delta\big(p(\mathbf{x}) - p^{\mathrm{true}}(\mathbf{x})\big)$$

where $\delta(\cdot)$ denotes the Dirac delta function. For problems of practical interest, this limit is never reached, so it is necessary to deal with observations representing only a sparse subset of the possible attribute values. This paper presents a probabilistic methodology for handling such a situation. Our attention is limited to problems where $p(\mathbf{x})$ is an unknown $\mathtt{x}$-dimensional continuous function. Figure 2 presents a one-dimensional example of such a function, where each of the $\mathtt{d}$ dots corresponds to a class observation, each associated with a different attribute value $x$.

Figure 1: Examples of Beta pdfs describing the posterior knowledge of $p(\mathbf{x})$, given sets of observations.
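The Beta update of Eqs. (1)-(2) and the limit above can be illustrated numerically. The following minimal Python sketch (standard library only) assumes a single attribute value $\mathbf{x}^*$ with a hypothetical true probability of 0.3. The helper names `beta_posterior` and `beta_mean_std`, and the tiny pseudo-count `prior` (which keeps both Beta parameters strictly positive when one class has not been observed, similar in spirit to the $10^{-10}$ prior means used in Section 3), are illustrative assumptions rather than part of the original formulation.

```python
import math
import random


def beta_posterior(labels, prior=1e-10):
    """Beta posterior parameters (a, b) for p(x*) from binary labels observed at x*.

    a and b are the direct counts of class-1 and class-0 labels (Eq. 2) plus a tiny
    pseudo-count `prior` (an assumption) that keeps both parameters strictly positive.
    """
    a = prior + sum(1 for c in labels if c == 1)
    b = prior + sum(1 for c in labels if c == 0)
    return a, b


def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    n = a + b
    return a / n, math.sqrt(a * b / (n * n * (n + 1.0)))


if __name__ == "__main__":
    rng = random.Random(0)
    p_true = 0.3  # hypothetical true class probability at x*
    for d in (10, 100, 10_000):
        labels = [1 if rng.random() < p_true else 0 for _ in range(d)]
        mean, std = beta_mean_std(*beta_posterior(labels))
        # The std shrinks roughly like 1/sqrt(d): the posterior concentrates around p_true.
        print(f"d = {d:>6}: posterior mean = {mean:.3f}, std = {std:.4f}")
```

As $\mathtt{d}$ grows, the posterior standard deviation shrinks toward zero, which is the numerical counterpart of the Dirac-delta limit.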

Figure 2: One-dimensional example of $p^{\mathrm{true}}(x)$, where dots correspond to $\mathtt{d}$ class observations, each corresponding to a different attribute value $x$.

For each vector of attribute values $\mathbf{x}_i$, our knowledge of $p^{\mathrm{true}}(\mathbf{x}_i)$ conditional on an observation $c_i(\mathbf{x}_i)$ is described by a Beta distribution as presented in Eq. (1). In order to propagate the knowledge $f''(p(\mathbf{x}_i)\,|\,c_i(\mathbf{x}_i))$ to $f''(p(\mathbf{x}_i + \Delta\mathbf{x})\,|\,c_i(\mathbf{x}_i))$, it is necessary to model their joint conditional pdf as a random field. $\mathcal{S} = \{\mathbf{x}_j\}_{j=1}^{\mathtt{s}}$ denotes the set of query attribute values for which one is interested in predicting the joint probability $p(\mathcal{S})$. Note that the set of query attribute values $\mathcal{S}$ contains the set of observed attribute values in $\mathcal{D}$, i.e. $\{\mathbf{x} : \mathbf{x} \in \mathcal{D}\} \subseteq \mathcal{S}$. For all $\mathbf{x} \in \mathcal{S}$, a joint conditional pdf is formulated using Beta-distributed marginal pdfs, as presented in Eq. (1), and a Gaussian copula function, as presented by Der Kiureghian and Liu [1]. This combination leads to the Nataf-Beta probability distribution given by


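$$\begin{aligned} f\big(p(\mathcal{S})\,|\,\mathcal{D}\big) &= f\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} p(\mathbf{x})\,\Big|\,\mathcal{D}\Big) = \mathrm{NBeta}\big(p(\mathcal{S}); a(\mathcal{S}), b(\mathcal{S}), \mathbf{R}_p\big) \\ &= \phi_{\mathtt{s}}\big(\mathbf{z}(p(\mathcal{S})); \mathbf{R}_p\big) \prod_{\mathbf{x} \in \mathcal{S}} \frac{\mathrm{Beta}\big(p(\mathbf{x}); a(\mathbf{x}), b(\mathbf{x})\big)}{\phi\big(z(p(\mathbf{x}))\big)} \end{aligned} \tag{3}$$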

where $Z \sim \phi(z)$ is a standard normal random variable whose $\mathtt{s}$-dimensional joint pdf is $\phi_{\mathtt{s}}(\mathbf{z}(p(\mathcal{S})); \mathbf{R}_p)$. The transformation between the standard normal space and the space of class probabilities is given by

$$z\big(p(\mathbf{x})\big) = \Phi^{-1}\Big[F_{\mathrm{B}}\big(p(\mathbf{x}); a(\mathbf{x}), b(\mathbf{x})\big)\Big]$$

where $\Phi^{-1}[\cdot]$ is the inverse standard normal cumulative distribution function (cdf) and $F_{\mathrm{B}}(\cdot\,; a, b)$ is the Beta cdf. $\mathbf{R}_p$ is the correlation matrix defined in the standard normal space. The spatial correlation between $P(\mathbf{x}_i)$ and $P(\mathbf{x}_j)$ is governed by the Mahalanobis distance between $\mathbf{x}_i$ and $\mathbf{x}_j$. Accordingly, a Gaussian radial basis function kernel [8] describes $[\mathbf{R}_p]_{ij}$, so that

$$[\mathbf{R}_p]_{ij} = \exp\Big(-\tfrac{1}{2}(\mathbf{x}_i - \mathbf{x}_j)^\mathsf{T}\,\mathrm{diag}(\mathbf{l}_p)^{-1}\,(\mathbf{x}_i - \mathbf{x}_j)\Big) \tag{4}$$

where $\mathbf{l}_p = [l_{p,1}, l_{p,2}, \dots, l_{p,\mathtt{x}}]^\mathsf{T}$ is a vector containing the length-scale parameter for each dimension of the attribute space.
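The marginal transformation $z(p(\mathbf{x})) = \Phi^{-1}[F_{\mathrm{B}}(p(\mathbf{x}))]$ is what turns Beta marginals into the standard normal space where $\mathbf{R}_p$ is defined. Python's standard library has no Beta cdf, so this sketch swaps in the classic continued-fraction (modified Lentz) evaluation of the regularized incomplete Beta function; in practice `scipy.stats.beta.cdf` and `scipy.stats.beta.ppf` would do the same job. The function names and the Beta parameters in the usage example are hypothetical, and the inverse map uses simple bisection rather than anything prescribed by the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and std 1


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction of the incomplete Beta function (modified Lentz method)."""
    tiny = 1e-300

    def clamp(v):
        # Avoid division by (near) zero inside the Lentz recursion.
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        num = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / clamp(1.0 + num * d)
        c = clamp(1.0 + num / c)
        h *= d * c
        # Odd term of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / clamp(1.0 + num * d)
        c = clamp(1.0 + num / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(p, a, b):
    """Regularized incomplete Beta function I_p(a, b), i.e. the Beta cdf F_B(p; a, b)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(p) + b * math.log1p(-p))
    front = math.exp(log_front)
    # The continued fraction converges quickly for p < (a + 1) / (a + b + 2);
    # otherwise use the symmetry I_p(a, b) = 1 - I_{1-p}(b, a).
    if p < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, p) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - p) / b


def to_standard_normal(p, a, b):
    """z(p) = Phi^{-1}[F_B(p; a, b)]: maps a Beta-distributed p(x) to standard normal space."""
    return STD_NORMAL.inv_cdf(beta_cdf(p, a, b))


def from_standard_normal(z, a, b, tol=1e-12):
    """Inverse map p = F_B^{-1}[Phi(z)], found by bisection because F_B is increasing."""
    u = STD_NORMAL.cdf(z)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    a, b = 3.0, 7.0  # hypothetical Beta parameters a(x), b(x) at one attribute value
    z = to_standard_normal(0.25, a, b)
    print(f"z = {z:.4f}, back-transformed p = {from_standard_normal(z, a, b):.6f}")
```

Sampling correlated standard normal values with $\mathbf{R}_p$ and passing each through `from_standard_normal` with its own $a(\mathbf{x}), b(\mathbf{x})$ gives draws from the Nataf-Beta pdf of Eq. (3).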

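Following the hypothesis that $p^{\mathrm{true}}(\mathbf{x})$ is a continuous function, our knowledge of it must be continuous as well. Here, our knowledge of $p^{\mathrm{true}}(\mathcal{S})$ is fully described by $a(\mathcal{S})$ and $b(\mathcal{S})$, each of which can take any positive real value, and it is assumed that $a(\mathcal{S}) \perp\!\!\!\perp b(\mathcal{S})$. The prior knowledge of $a^{\mathrm{true}}(\mathcal{S})$ and $b^{\mathrm{true}}(\mathcal{S})$ is described by random fields, in this case lognormal processes, so that

$$\begin{aligned} A'(\mathcal{S}) &\sim f'\big(a(\mathcal{S})\big) = \ln\mathcal{N}\big(a(\mathcal{S}); \boldsymbol{\lambda}'_a, \boldsymbol{\Sigma}'_a\big) \\ B'(\mathcal{S}) &\sim f'\big(b(\mathcal{S})\big) = \ln\mathcal{N}\big(b(\mathcal{S}); \boldsymbol{\lambda}'_b, \boldsymbol{\Sigma}'_b\big) \end{aligned} \tag{5}$$

where

$$\boldsymbol{\Sigma}'_a = \mathrm{diag}(\boldsymbol{\zeta}_a)\,\mathbf{R}_a\,\mathrm{diag}(\boldsymbol{\zeta}_a), \qquad \boldsymbol{\Sigma}'_b = \mathrm{diag}(\boldsymbol{\zeta}_b)\,\mathbf{R}_b\,\mathrm{diag}(\boldsymbol{\zeta}_b)$$

$\boldsymbol{\lambda}' = [\lambda'_1, \lambda'_2, \dots, \lambda'_{\mathtt{s}}]^\mathsf{T}$ and $\boldsymbol{\zeta} = [\zeta_1, \zeta_2, \dots, \zeta_{\mathtt{s}}]^\mathsf{T}$ are, respectively, the vectors of means and standard deviations of $a'(\mathbf{x})$ and $b'(\mathbf{x})$ taken in the log-space. Following Eq. (4),

$$[\mathbf{R}_a]_{ij} = \exp\Big(-\tfrac{1}{2}(\mathbf{x}_i - \mathbf{x}_j)^\mathsf{T}\,\mathrm{diag}(\mathbf{l}_a)^{-1}\,(\mathbf{x}_i - \mathbf{x}_j)\Big), \qquad [\mathbf{R}_b]_{ij} = \exp\Big(-\tfrac{1}{2}(\mathbf{x}_i - \mathbf{x}_j)^\mathsf{T}\,\mathrm{diag}(\mathbf{l}_b)^{-1}\,(\mathbf{x}_i - \mathbf{x}_j)\Big)$$

Again, $\mathbf{l}_a$ and $\mathbf{l}_b$ are $\mathtt{x}$-dimensional vectors containing the length scales corresponding to each dimension of the attribute space. Note that the lognormal processes in Eq. (5) are only a transformation of a Gaussian process, for which an analytic formulation is already available [12].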
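The kernel of Eq. (4) and the lognormal prior of Eq. (5) can be sketched together, since $\mathbf{R}_p$, $\mathbf{R}_a$ and $\mathbf{R}_b$ share the same functional form. This minimal Python sketch (standard library only) builds the correlation matrix, forms $\boldsymbol{\Sigma}' = \mathrm{diag}(\boldsymbol{\zeta})\,\mathbf{R}\,\mathrm{diag}(\boldsymbol{\zeta})$, and draws one realization by exponentiating a correlated Gaussian sample. The function names, the small diagonal `jitter` added before the Cholesky factorization, and the values of $\boldsymbol{\lambda}'$, $\boldsymbol{\zeta}$ and the length scale in the usage example are all assumptions for illustration.

```python
import math
import random


def rbf_correlation(points, length_scales):
    """Correlation matrix following Eq. (4):
    R_ij = exp(-0.5 * sum_k (x_ik - x_jk)^2 / l_k), i.e. diag(l)^{-1} as written there.
    """
    n = len(points)
    R = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            q = sum((u - v) ** 2 / l for u, v, l in zip(points[i], points[j], length_scales))
            R[i][j] = R[j][i] = math.exp(-0.5 * q)
    return R


def log_space_covariance(zeta, R):
    """Sigma' = diag(zeta) R diag(zeta)."""
    n = len(zeta)
    return [[zeta[i] * R[i][j] * zeta[j] for j in range(n)] for i in range(n)]


def cholesky(A, jitter=1e-10):
    """Lower-triangular L with L L^T = A + jitter * I (jitter guards a near-singular R)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] + (jitter if i == j else 0.0) - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_lognormal_process(lam, zeta, R, rng):
    """One draw of a'(S) ~ lnN(lam, Sigma'): exponentiate a correlated Gaussian draw."""
    L = cholesky(log_space_covariance(zeta, R))
    eps = [rng.gauss(0.0, 1.0) for _ in lam]
    log_a = [lam[i] + sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(lam))]
    return [math.exp(v) for v in log_a]


if __name__ == "__main__":
    xs = [[0.0], [0.5], [1.0], [3.0]]  # hypothetical query attribute values (1-D)
    R_a = rbf_correlation(xs, length_scales=[2.0])
    draw = sample_lognormal_process([0.0] * 4, [1.0] * 4, R_a, random.Random(1))
    print([round(v, 3) for v in draw])
```

Nearby attribute values receive strongly correlated draws, and every draw is positive, as required for Beta parameters.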

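Following Eq. (2), $\check{a}(\mathbf{x})$ and $\check{b}(\mathbf{x})$ denote the direct counts of the number of observations in $\mathcal{D}$ of a given class for a specific set of attribute values, so that

$$\check{a}(\mathbf{x}) = \#\{i : c_i(\mathbf{x}) = 1\}_{i=1}^{\mathtt{d}}, \qquad \check{b}(\mathbf{x}) = \#\{i : c_i(\mathbf{x}) = 0\}_{i=1}^{\mathtt{d}}$$

$\check{a}(\mathcal{S})$ and $\check{b}(\mathcal{S})$ denote vectors containing the direct counts of the number of observations of a given class for each $\mathbf{x} \in \mathcal{S}$, so that

$$\check{a}(\mathcal{S}) = [\check{a}(\mathcal{S}_1), \check{a}(\mathcal{S}_2), \dots, \check{a}(\mathcal{S}_{\mathtt{s}})]^\mathsf{T}, \qquad \check{b}(\mathcal{S}) = [\check{b}(\mathcal{S}_1), \check{b}(\mathcal{S}_2), \dots, \check{b}(\mathcal{S}_{\mathtt{s}})]^\mathsf{T}$$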
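By knowing $\mathbf{l}_a$ and $\mathbf{l}_b$, it is possible to propagate the knowledge $\check{a}(\mathbf{x}_i)$, $\check{b}(\mathbf{x}_i)$, gained at a specific location $\mathbf{x}_i$ from a single observation $c_i(\mathbf{x}_i) \in \mathcal{D}$, to any other subset of attribute values, $a(\mathbf{x}), b(\mathbf{x}), \forall \mathbf{x} \in \mathcal{S}$. Given our assumption that the prior pdfs of $a(\mathcal{S})$ and $b(\mathcal{S})$ are each described by a lognormal process, their posterior pdfs are obtained using the Gaussian conditional [8], so that

$$\begin{aligned} f''\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} a(\mathbf{x})\,\Big|\,\check{a}(\mathbf{x}_i)\Big) &= \ln\mathcal{N}\big(a(\mathcal{S}); \boldsymbol{\lambda}''_a, \boldsymbol{\Sigma}''_a\big) \\ f''\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} b(\mathbf{x})\,\Big|\,\check{b}(\mathbf{x}_i)\Big) &= \ln\mathcal{N}\big(b(\mathcal{S}); \boldsymbol{\lambda}''_b, \boldsymbol{\Sigma}''_b\big) \end{aligned} \tag{8}$$

where the parameters $\boldsymbol{\lambda}''$ and $\boldsymbol{\Sigma}''$ are the conditional mean vector and covariance matrix of the underlying Gaussian process in the log-space, obtained by conditioning on the observation made at $\mathbf{x}_i$.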


In the limit case where a length scale $l \to 0$, no knowledge is transferred to attribute values other than the one directly measured; if $l \to \infty$, all attribute values share the same knowledge, i.e. $a(\mathbf{x}_i) = a(\mathbf{x}_j), \forall i, j$, no matter how far they are from the observed attribute value.
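The two limit cases can be checked with a minimal sketch of the Gaussian conditioning step behind Eq. (8). It assumes a one-dimensional attribute space, unit log-space standard deviations (so the covariance equals the correlation matrix), and an exact (noise-free) observation of the log-space value at a single point; the helper names and numbers are hypothetical, and the sketch leaves aside how zero counts, whose logarithm is undefined, are handled.

```python
import math


def rbf_1d(xs, l):
    """1-D version of the kernel of Eq. (4): exp(-0.5 (x_i - x_j)^2 / l)."""
    return [[math.exp(-0.5 * (u - v) ** 2 / l) for v in xs] for u in xs]


def condition_on_one(lam, cov, i, value):
    """Gaussian conditional of a log-space vector on an exact observation at index i.

    mean'' = lam + cov[:, i] / cov[i][i] * (value - lam[i])
    cov''  = cov - cov[:, i] cov[i, :] / cov[i][i]
    """
    n = len(lam)
    gain = [cov[j][i] / cov[i][i] for j in range(n)]
    mean = [lam[j] + gain[j] * (value - lam[i]) for j in range(n)]
    post = [[cov[r][c] - gain[r] * cov[i][c] for c in range(n)] for r in range(n)]
    return mean, post


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 2.0, 5.0]  # hypothetical query points; observation at xs[0]
    lam = [0.0] * len(xs)           # hypothetical prior mean in the log-space
    observed_log_value = 1.0        # hypothetical log-space value observed at xs[0]
    for l in (1e-6, 1.0, 1e6):
        mean, _ = condition_on_one(lam, rbf_1d(xs, l), 0, observed_log_value)
        print(f"l = {l:g}: conditional means = {[round(m, 3) for m in mean]}")
```

With a tiny length scale, only the observed point moves away from the prior mean; with a huge one, every point takes the observed value, mirroring the two limits described above.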

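The respective posterior joint distributions of $a(\mathcal{S})$ and $b(\mathcal{S})$ conditional on the set of observations $\mathcal{D}$ are obtained by summing, over all observations, the single-observation posteriors of Eq. (8), so that

$$\begin{aligned} A''(\mathcal{S}) &\sim f''\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} a(\mathbf{x})\,\Big|\,\mathcal{D}\Big) = \sum_{i=1}^{\mathtt{d}} f''\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} a(\mathbf{x})\,\Big|\,\check{a}(\mathbf{x}_i)\Big) \\ B''(\mathcal{S}) &\sim f''\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} b(\mathbf{x})\,\Big|\,\mathcal{D}\Big) = \sum_{i=1}^{\mathtt{d}} f''\Big(\bigcap_{\mathbf{x} \in \mathcal{S}} b(\mathbf{x})\,\Big|\,\check{b}(\mathbf{x}_i)\Big) \end{aligned} \tag{11}$$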


The joint posterior pdf describing the probability of belonging to a particular class conditioned on observations is given by

$$f''\big(p(\mathcal{S})\,|\,\mathcal{D}\big) = \mathrm{NBeta}\big(p(\mathcal{S}); A''(\mathcal{S}), B''(\mathcal{S}), \mathbf{R}_p\big)$$
In the above procedure, it is assumed that the length scales $\mathbf{l}_a$, $\mathbf{l}_b$, and $\mathbf{l}_p$ are known constants. In practical cases, their values must be learned from observations. Bayesian inference is employed to learn the posterior pdf of the length-scale parameters $\mathbf{l}_a$, $\mathbf{l}_b$, and $\mathbf{l}_p$ following

$$f''(\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p\,|\,\mathcal{D}) = \frac{f(\mathcal{D}\,|\,\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p)\, f'(\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p)}{f(\mathcal{D})}$$

where $f'(\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p)$ is the joint pdf describing prior knowledge and $f(\mathcal{D})$ is the normalization constant. In order to derive an analytical formulation for the likelihood function $f(\mathcal{D}\,|\,\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p)$, it is necessary to notice that $\mathbf{l}_a$ and $\mathbf{l}_b$ are hyper-parameters, i.e., they are parameters of the prior knowledge. The joint pdf describing the probability of belonging to a particular class, based only on the knowledge propagated from indirect observations, is given by

$$f'\big(p(\mathcal{S})\,|\,\mathcal{D}\big) = \mathrm{NBeta}\big(p(\mathcal{S}); A'(\mathcal{S}), B'(\mathcal{S}), \mathbf{R}_p\big)$$

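where $A'(\mathcal{S})$ and $B'(\mathcal{S})$ represent only the knowledge that has been propagated from indirect observations, so that

$$A'(\mathcal{S}) = A''(\mathcal{S}) - \check{a}(\mathcal{S}), \qquad B'(\mathcal{S}) = B''(\mathcal{S}) - \check{b}(\mathcal{S})$$

In the limit case where the length scales $\mathbf{l}_a = \mathbf{l}_b \to \mathbf{0}$,

$$a'(\mathbf{x}) = b'(\mathbf{x}) \to 0, \quad \forall \mathbf{x} \in \mathcal{S}$$

and in the other limit case where $\mathbf{l}_a = \mathbf{l}_b \to \infty$,

$$a'(\mathbf{x}) \to \mathbb{E}[a(\mathcal{S})] - \check{a}(\mathbf{x}), \qquad b'(\mathbf{x}) \to \mathbb{E}[b(\mathcal{S})] - \check{b}(\mathbf{x})$$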

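How well particular values of $\mathbf{l}_a$, $\mathbf{l}_b$ and $\mathbf{l}_p$ explain the set of observations $\mathcal{D}$ is quantified through the likelihood function

$$\begin{aligned} f(\mathcal{D}\,|\,\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p) = \prod_{i=1}^{\mathtt{d}} \iiint\; & p(\mathbf{x}_i)^{\check{a}(\mathbf{x}_i)}\big(1 - p(\mathbf{x}_i)\big)^{\check{b}(\mathbf{x}_i)} \cdot \mathrm{NBeta}\big(p(\mathbf{x}_i); a'(\mathbf{x}_i), b'(\mathbf{x}_i), \mathbf{R}_p\big) \\ & \cdot f\big(a'(\mathbf{x}_i)\big)\, f\big(b'(\mathbf{x}_i)\big)\; da'(\mathbf{x}_i)\, db'(\mathbf{x}_i)\, dp(\mathbf{x}_i) \end{aligned} \tag{12}$$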

The likelihood function in Eq. (12) has no closed-form analytic solution. An accurate approximation of the likelihood can be obtained using Monte Carlo sampling techniques such as those presented by MacKay [6]. A computationally less demanding, but cruder, approximation consists in using the expected values $\mathbb{E}[A'(\mathcal{S})]$ and $\mathbb{E}[B'(\mathcal{S})]$ instead of the full probability densities. This reduces the likelihood function to

$$f(\mathcal{D}\,|\,\mathbf{l}_a, \mathbf{l}_b, \mathbf{l}_p) \approx \prod_{i=1}^{\mathtt{d}} \int p(\mathbf{x}_i)^{\check{a}(\mathbf{x}_i)}\big(1 - p(\mathbf{x}_i)\big)^{\check{b}(\mathbf{x}_i)} \cdot \mathrm{NBeta}\big(p(\mathbf{x}_i); \mathbb{E}[A'(\mathbf{x}_i)], \mathbb{E}[B'(\mathbf{x}_i)], \mathbf{R}_p\big)\, dp(\mathbf{x}_i)$$

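Unfortunately, this simplification is insufficient to lead to an analytically tractable solution. An analytically tractable solution is reached by making the simplifying assumption that the class probabilities for different sets of attribute values are independent, so that $P(\mathbf{x}_i) \perp\!\!\!\perp P(\mathbf{x}_j), \forall i \neq j$. In that case, the length scale $\mathbf{l}_p = \mathbf{0}$ and the likelihood reduces to

$$f(\mathcal{D}\,|\,\mathbf{l}_a, \mathbf{l}_b) \approx \prod_{i=1}^{\mathtt{d}} \frac{B\big(\mathbb{E}[A'(\mathbf{x}_i)] + \check{a}(\mathbf{x}_i),\; \mathbb{E}[B'(\mathbf{x}_i)] + \check{b}(\mathbf{x}_i)\big)}{B\big(\mathbb{E}[A'(\mathbf{x}_i)],\; \mathbb{E}[B'(\mathbf{x}_i)]\big)} \tag{13}$$

where $B(\cdot, \cdot)$ is the Beta function.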
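Eq. (13) is a product of Beta-function ratios, which is best evaluated in log form to avoid overflow. The following minimal Python sketch (standard library only) computes the log of Eq. (13) from given values of $\mathbb{E}[A'(\mathbf{x}_i)]$, $\mathbb{E}[B'(\mathbf{x}_i)]$ and the direct counts; how those expected values are obtained from the length scales is outside the sketch, and the function names and numbers are hypothetical.

```python
import math


def log_beta(a, b):
    """ln B(a, b) via log-gamma, which avoids overflow for large counts."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_likelihood_eq13(prior_a, prior_b, count_a, count_b):
    """ln f(D | l_a, l_b) under Eq. (13).

    prior_a[i], prior_b[i]: E[A'(x_i)], E[B'(x_i)], the knowledge propagated to x_i
    from the other observations; count_a[i], count_b[i]: direct counts at x_i.
    """
    return sum(log_beta(pa + ca, pb + cb) - log_beta(pa, pb)
               for pa, pb, ca, cb in zip(prior_a, prior_b, count_a, count_b))


if __name__ == "__main__":
    # Hypothetical propagated knowledge and direct counts for three observations.
    ll = log_likelihood_eq13([3.0, 0.5, 2.0], [1.0, 4.5, 2.0], [1, 0, 1], [0, 1, 0])
    print(f"log-likelihood = {ll:.4f}")
```

For a single binary observation at $\mathbf{x}_i$, each factor reduces to $\mathbb{E}[A'(\mathbf{x}_i)] / (\mathbb{E}[A'(\mathbf{x}_i)] + \mathbb{E}[B'(\mathbf{x}_i)])$ or its complement, so Eq. (13) measures how well the knowledge propagated from the other observations predicts each observed label.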

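The posterior predictive pdf $\tilde{f}''(p(\mathcal{S})\,|\,\mathcal{D})$ is given by

$$\tilde{f}''\big(p(\mathcal{S})\,|\,\mathcal{D}\big) = \int p(\mathcal{S})\, f''\big(p(\mathcal{S})\,|\,\mathcal{D}\big)\, dp(\mathcal{S}) \tag{14}$$

Since no analytic formulation exists for the pdf in Eq. (14), it has to be evaluated numerically. Following the same simplifying assumptions as for Eq. (13), an approximation of the posterior predictive pdf is given by

$$\tilde{f}''\big(p(\mathcal{S})\,|\,\mathcal{D}\big) \approx \frac{\mathbb{E}[A''(\mathcal{S})]}{\mathbb{E}[A''(\mathcal{S})] + \mathbb{E}[B''(\mathcal{S})]} \tag{15}$$

where the division is taken element-wise.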
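Eq. (15) is simply the mean of a Beta distribution evaluated at each query point. This minimal Python sketch applies it element-wise and adds a binary decision rule; the function names, the example values, and the choice to assign an exact tie of 0.5 to class 0 are assumptions for illustration.

```python
def predictive_probability(mean_a, mean_b):
    """Eq. (15), element-wise: Pr(C(x) = 1 | D) ~= E[A''(x)] / (E[A''(x)] + E[B''(x)])."""
    return [a / (a + b) for a, b in zip(mean_a, mean_b)]


def predicted_label(prob_class_1, threshold=0.5):
    """Binary decision: class 1 if the predictive probability exceeds the threshold."""
    return 1 if prob_class_1 > threshold else 0


if __name__ == "__main__":
    # Hypothetical E[A''] and E[B''] at three query points.
    probs = predictive_probability([3.0, 1e-10, 40.0], [1.0, 1e-10, 60.0])
    print(probs, [predicted_label(p) for p in probs])
```

The second query point, which carries only the tiny prior pseudo-counts, yields exactly 0.5: the predictive mean alone cannot tell such a point apart from a genuinely ambiguous one.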


Figure 3 presents the graphical models [8, 10] for (a) the Beta conjugate prior, and for (b) the complete and (c) the simplified formulations of the Nataf-Beta Random Field Classifier. In the graphical models, circles represent random variables, arrows correspond to causal relations, and links to bidirectional non-causal relations. Single-line nodes are discrete random variables; double-line nodes are continuous random variables. Note that for the graphical models in (b) and (c) there is one $C(\mathbf{x})$ per column of nodes; for (a), there are $\mathtt{d}$ nodes $C(\mathbf{x})$. Also, in the special case where all observations are obtained for the same attribute value, $\mathbf{x}_i = \mathbf{x}^*$, the Nataf-Beta Random Field Classifier collapses to the Beta-binomial conjugate prior. In such a special case, the graphical models in Figure 3 (a), (b) and (c) are all equivalent.
Figure 3: Graphical models describing (a) the Beta conjugate prior (Eq. 1), (b) the complete formulation of the Nataf-Beta Random Field Classifier (Eq. 12), and (c) the simplified formulation of the Nataf-Beta Random Field Classifier (Eq. 13).


3 Empirical validation 


This section validates the performance of the Nataf-Beta Random Field Classifier using simulated and benchmark datasets. Both rely on the same prior knowledge and use the same search algorithm to identify length scales.

All results were obtained using the simplifying hypotheses taken in Eq. (13), so that the length scale $\mathbf{l}_p = \mathbf{0}$. Also, the number of parameters is reduced by assuming that $\mathbf{l}_a = \mathbf{l}_b$.

The prior mean and variance of $A'(\mathbf{x})$ and $B'(\mathbf{x})$ both tend to 0; for practical purposes, $\mathbb{E}[A'(\mathbf{x})] = \mathbb{E}[B'(\mathbf{x})] = 10^{-10}$ and $\mathrm{var}[A'(\mathbf{x})] = \mathrm{var}[B'(\mathbf{x})] = 10^{-20}$. For all following applications, the prior pdfs of $\mathbf{l}_a$ and $\mathbf{l}_b$ are assumed to be uniform for $l \in \mathbb{R}^+$.

The maximum a posteriori (MAP) values of $\mathbf{l}_a$ and $\mathbf{l}_b$ are sought using a Newton-Raphson gradient ascent method [8]. For all examples, only the MAP estimates are used. For every analysis, the starting point for each length scale corresponds to the mean of the attribute values in the training set. The search stops when (1) the difference in the mean log-likelihood, $\mathbb{E}[\ln \mathcal{L}]$, over the 10 and 5 previous steps is less than $10^{-3} \times \mathbb{E}[\ln \mathcal{L}]$, (2) more than 100 iterations have been made, or (3) more than 2 hours have been spent without reaching convergence.
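The length-scale search can be sketched as a one-dimensional Newton-Raphson ascent. This Python sketch is a simplified stand-in, not the original implementation: it works on $\theta = \ln l$ (an assumption that keeps the length scale positive), uses central finite differences instead of analytic derivatives, falls back to a plain gradient step where the objective is not locally concave, and replaces the two-window stopping rule above with a single relative-change test plus the 100-iteration cap (the time limit is omitted). Because the prior on the length scales is uniform, the MAP point coincides with the maximum of the likelihood, e.g. the logarithm of Eq. (13); the toy objective in the usage example is hypothetical.

```python
import math


def newton_raphson_maximize(log_lik, theta0, h=1e-4, rel_tol=1e-3, max_iter=100):
    """Maximize a 1-D objective with Newton-Raphson steps on finite-difference derivatives.

    Stops when the change in the objective is below rel_tol * |objective|
    or after max_iter iterations (a simplified version of the criteria above).
    """
    theta, value = theta0, log_lik(theta0)
    for _ in range(max_iter):
        f_plus, f_minus = log_lik(theta + h), log_lik(theta - h)
        grad = (f_plus - f_minus) / (2.0 * h)
        curv = (f_plus - 2.0 * value + f_minus) / (h * h)
        # Newton step where the objective is locally concave, plain gradient step otherwise.
        step = -grad / curv if curv < 0.0 else grad
        new_theta = theta + step
        new_value = log_lik(new_theta)
        converged = abs(new_value - value) < rel_tol * abs(value)
        theta, value = new_theta, new_value
        if converged:
            break
    return theta, value


if __name__ == "__main__":
    # Hypothetical concave log-likelihood of theta = ln(l), peaking at l = 2.
    def toy_log_lik(theta):
        return -((theta - math.log(2.0)) ** 2) - 10.0

    theta, value = newton_raphson_maximize(toy_log_lik, theta0=0.0)
    print(f"MAP length scale ~ {math.exp(theta):.4f}, log-likelihood = {value:.4f}")
```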

Two performance metrics are computed to characterize classification accuracy. In each case, the accuracy is quantified using a 10-fold cross-validation procedure, and the average of the results obtained over the 10 test sets is reported. The first metric is the correct classification rate (CCR), defined by

$$\mathrm{CCR} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$$

where TP, TN, FP and FN respectively stand for true positive, true negative, false positive and false negative. In binary classification cases, a test observation of class $c$ is counted as correctly classified (i.e. as either a TP or a TN instance) if the posterior predictive $\tilde{f}''(\Pr(C(\mathbf{x}) = c)\,|\,\mathcal{D}) > 0.5$. When the classification problem has more than two classes, an observation is assigned to class $c$ if the posterior predictive $\tilde{f}''(\Pr(C(\mathbf{x}) = c)\,|\,\mathcal{D})$ is the greatest among all classes.
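The CCR computation for one test fold can be sketched as follows, assuming the posterior predictive probabilities of every class are already available for each test point; the function names and example values are hypothetical. The arg-max rule covers both the binary and the multi-class cases; for two classes it agrees with the "greater than 0.5" rule except at an exact tie, where Python's `max` returns the first class.

```python
def predict_class(class_probs):
    """Index of the class with the greatest posterior predictive probability."""
    return max(range(len(class_probs)), key=lambda c: class_probs[c])


def correct_classification_rate(prob_rows, true_labels):
    """CCR = (TP + TN) / (TP + TN + FP + FN): the fraction of test points whose
    most probable class matches the true label (binary or multi-class)."""
    correct = sum(predict_class(p) == y for p, y in zip(prob_rows, true_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Hypothetical predictive probabilities [Pr(class 0), Pr(class 1)] for four test points.
    rows = [[0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.9, 0.1]]
    print(correct_classification_rate(rows, [1, 0, 0, 0]))  # 3 of 4 correct
```

In the 10-fold procedure, this value would be computed on each test fold and the 10 results averaged.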

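The second metric is the probability of correct classification (PCC),

$$\mathrm{PCC} = \frac{1}{\mathtt{n}} \sum_{i=1}^{\mathtt{n}} \tilde{f}''\big(\Pr(C_i(\mathbf{x}_i) = c^*)\,|\,\mathcal{D}\big)$$

where the sum runs over the $\mathtt{n}$ observations of the test set and $c^*$ denotes the most probable class, so that

$$c^* = \arg\max_{c_i}\, \tilde{f}''\big(\Pr(C_i(\mathbf{x}_i) = c_i)\,|\,\mathcal{D}\big)$$

The probability of correct classification allows estimating the classification accuracy without using the class labels of the test set. If $\mathrm{PCC} - \mathrm{CCR} \to 0$, the classification accuracy was predictable before observing any test labels.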
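PCC needs only the predictive probabilities, not the labels. A minimal Python sketch, assuming the same per-point list of class probabilities as in a CCR computation (the function name and values are hypothetical):

```python
def probability_of_correct_classification(prob_rows):
    """PCC: mean, over test points, of the predictive probability of the most probable class.

    Only the predictive probabilities are needed, not the test labels.
    """
    return sum(max(p) for p in prob_rows) / len(prob_rows)


if __name__ == "__main__":
    rows = [[0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.9, 0.1]]  # hypothetical
    print(round(probability_of_correct_classification(rows), 3))
```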


3.1 Simulated data 

The first example consists of a binary classification problem where simulated data are generated using samples from the pdf described by Eq. (3), for $l_a = l_b = l_p = 2$. The attribute values are drawn uniformly over the range (0, 10). Figure 2 presents the simulated $p^{\mathrm{true}}(x)$, which corresponds to one realization of $f(p(\mathcal{S})\,|\,\mathcal{D})$.

Table 1 presents the classification accuracy obtained using 10, 100 and 500 simulated observations. Given that this is a simulated example, the true CCR and PCC accuracies are available. These results indicate that as the number of observations increases, the classification accuracy approaches the true value. Figure 4 compares the performance of the Nataf-Beta Random Field Classifier using an increasing number of observations: (a) 10, (b) 100 and (c) 500. Each plot shows the contours of the posterior pdf, $f''(p(\mathcal{S})\,|\,\mathcal{D})$, the posterior predictive pdf, $\tilde{f}''(p(\mathcal{S})\,|\,\mathcal{D})$, and the true values $p^{\mathrm{true}}(x)$. Again, as the number of observations increases, the contours of the posterior pdf and the posterior predictive approach the true values $p^{\mathrm{true}}(x)$.

Table 1: Comparison of the classification accuracy (CCR) and the probability of correct classification (PCC) with their true values.

| # observations | CCR (true) [%] | PCC (true) [%] | E[CCR] [%] | std[CCR] [%] | E[PCC] [%] | std[PCC] [%] |
|---|---|---|---|---|---|---|
| 10 | 70.2 | 67.9 | 60.0 | 51.6 | 46.5 | 11.8 |
| 100 | 81.0 | 73.1 | 81.0 | 12.9 | 71.3 | 7.3 |
| 500 | 79.0 | 71.2 | 78.4 | 5.6 | 70.9 | 3.1 |

3.2 Benchmark datasets 

This section presents the accuracy of the Nataf-Beta Random Field Classifier for six benchmark datasets with real-continuous and real-integer attribute values, taken from the UCI Machine Learning Repository [5]. Table 2 provides target classification accuracy ranges for each dataset as reported by the following papers: […]. Note that in the references cited, no single methodology outperforms all others for all datasets. For datasets containing missing data, missing values are replaced by the mean attribute value across the dataset.
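The mean-imputation step described above can be sketched in a few lines of Python (standard library only). This sketch follows the wording "across the dataset" literally, computing each attribute mean over all rows where the value is present; the function name and the representation of missing values as `None` or NaN are assumptions.

```python
import math


def impute_with_column_mean(rows):
    """Replace missing attribute values (None or NaN) by the mean of the observed
    values of the same attribute across the dataset."""
    def missing(v):
        return v is None or math.isnan(v)

    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not missing(r[j])]
        means.append(sum(observed) / len(observed))
    # Build a new dataset; the input rows are left unchanged.
    return [[means[j] if missing(v) else v for j, v in enumerate(r)] for r in rows]


if __name__ == "__main__":
    data = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]  # hypothetical attributes
    print(impute_with_column_mean(data))
```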

The classification accuracies reached with the Nataf-Beta Random Field Classifier are presented in Table 3. Although the classification accuracy does not in any case statistically outperform the best accuracy reported in the literature, it consistently ranks among the top tier. For the Iris, Breast Cancer and Ionosphere datasets, the CCR and PCC accuracies are almost equal.
Figure 4: Comparison of the performance of the Nataf-Beta Random Field Classifier using (a) 10, (b) 100, and (c) 500 observations. Each plot shows the contours of the posterior pdf, $f''(p(\mathcal{S})\,|\,\mathcal{D})$, the posterior predictive pdf, $\tilde{f}''(p(\mathcal{S})\,|\,\mathcal{D})$, and the true values $p^{\mathrm{true}}(x)$. Simulated observations are depicted by circles.
Table 2: The accuracies reported are the ranges found in the following references: […]. Note that in the references cited, no single methodology outperforms all others for all datasets.

| Dataset | Accuracy range, E[CCR] [%] | Reference |
|---|---|---|
| Iris | 71.6-97.3 | […] |
| Pima | 55.3-75.4 | […] |
| Breast Cancer | 89.7-97.0 | […] |
| Ionosphere | 41.0-93.7 | […] |
| Glass | 38.1-95.5 | […] |
| E.Coli | 55.7-85.4 | […] |


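The close agreement between CCR and PCC for the Iris, Breast Cancer and Ionosphere datasets means that, for these datasets, the predicted classification accuracy is itself accurate. For the other datasets, the CCR and PCC values are consistent with each other, yet not as close; in every case in Table 3, PCC is lower than CCR, i.e. the predicted accuracy underestimates the observed one. These results support the claim that the proposed Nataf-Beta Random Field Classifier is suited as a general purpose classification approach for real-continuous and real-integer attribute value problems.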


Table 3: Nataf-Beta Random Field Classifier validation: comparison of the classification accuracy (CCR) and the probability of correct classification (PCC) for UCI datasets. Results are averages obtained from the test sets of 10-fold cross-validation analyses.

| Dataset | E[CCR] [%] | std[CCR] [%] | E[PCC] [%] | std[PCC] [%] |
|---|---|---|---|---|
| Iris | 96.0 | 7.2 | 93.8 | 3.8 |
| Pima | 73.0 | 6.0 | 65.6 | 4.4 |
| Breast Cancer | 96.0 | 2.2 | 94.7 | 2.2 |
| Ionosphere | 88.0 | 4.6 | 87.1 | 3.7 |
| Glass | 80.5 | 6.5 | 67.5 | 8.3 |
| E.Coli | 85.5 | 6.2 | 78.2 | 5.7 |
4 Comparison with Gaussian Process classification


Gaussian process classification (GPC) is similar to the approach proposed in this paper because it models the spatial dependencies in the knowledge of the probability of a class using a Gaussian process. […] describe Gaussian process classification as a “natural generalization of the linear logistic regression model”. The main idea is to define a Gaussian process (GP) over the domain of attribute values and then to map this GP to the probability space using a sigmoid-shaped transformation function such as the logistic (inverse logit) or probit function. The parameters of that GP are identified using the calibration set $\mathcal{D}$.

GPC can be seen as a bottom-up approach: the starting point is the GP, which provides a convenient analytic formulation for modeling spatial dependencies of real-valued outcomes. In order to be compatible with classification problems, these outcomes are transformed using a sigmoid function chosen to satisfy the requirement that the probability of a class, $\Pr(C(\mathbf{x}) = 1)$, must be defined over the interval (0, 1). In this case, neither the choice of the sigmoid-shaped transformation function nor the parameters $\boldsymbol{\theta}$ defining the Gaussian process, $\mathrm{GP}(y(\mathcal{S}); \boldsymbol{\theta}'')$, have a direct interpretation in relation to the classification problem. The posterior pdf describing the probability of belonging to a given class is conceptually given by

$$f''\big(p(\mathcal{S})\,|\,\mathcal{D}\big) = \mathrm{sigmoid}\big(\mathrm{GP}(y(\mathcal{S}); \boldsymbol{\theta}'')\big)$$

where $y(\mathcal{S})$ is a set of real-valued outcomes obtained for each query point in $\mathcal{S}$.

Alternatively, the Nataf-Beta Random Field Classifier can be seen as a top-down approach: the starting point is that, for a given vector of attribute values $\mathbf{x}$, the classification problem is genuinely described by the Beta conjugate prior. The formulation presented in Eq. (3) models the posterior pdf describing the probability of belonging to a given class as a random field (i.e. a Nataf-Beta joint pdf) whose marginal pdfs are Beta distributed. The posterior pdf describing the probability of belonging to a given class is conceptually given by

$$f''\big(p(\mathcal{S})\,|\,\mathcal{D}\big) = \mathrm{NBeta}\big(p(\mathcal{S}); A''(\mathcal{S}), B''(\mathcal{S}), \mathbf{R}_p\big)$$

Given that $a(\mathbf{x})$ and $b(\mathbf{x})$ are positive real-valued numbers, $A''(\mathcal{S})$ and $B''(\mathcal{S})$ are each modeled by a random field, in this case a lognormal process as described in Eq. (5).

Both approaches provide a posterior joint pdf describing the probability of belonging to a class for a set of query attribute values, $\mathcal{S}$. Therefore, both approaches are able to distinguish between a lack of knowledge, $\Pr(\text{class }\#1) = \Pr(\text{class }\#2) = 0.5$, due to a lack of observations, and intrinsically ambiguous class probabilities, i.e. $\Pr^{\mathrm{true}}(\text{class }\#1) = \Pr^{\mathrm{true}}(\text{class }\#2) = 0.5$: both cases yield a predictive probability of 0.5, but the spread of the posterior pdf differs. The main difference between the Nataf-Beta Random Field Classifier and Gaussian process classification thus lies in the interpretation. The Nataf-Beta Random Field Classifier has the same interpretation as the Beta-Bernoulli model, and its formulation follows directly from the binary classification problem; Gaussian process classification is a powerful proxy capable of fitting complex functions $p^{\mathrm{true}}(\mathbf{x})$, but its formulation is not rooted in classification problems.
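The distinction between a lack of knowledge and an intrinsically ambiguous class can be made concrete with Beta posteriors. This minimal Python sketch compares a point carrying only the $10^{-10}$ prior pseudo-counts used in Section 3 with a hypothetical point where 1000 observations split evenly; both have a posterior mean of 0.5, but their standard deviations differ by more than an order of magnitude. The function name is illustrative.

```python
import math


def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    n = a + b
    return a / n, math.sqrt(a * b / (n * n * (n + 1.0)))


if __name__ == "__main__":
    # No observations: only the tiny prior pseudo-counts (10^-10, as in Section 3).
    print("lack of knowledge:", beta_mean_std(1e-10, 1e-10))
    # Many observations split evenly: the true probability itself is ambiguous.
    print("ambiguous class:  ", beta_mean_std(500.0, 500.0))
```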


5 Discussion 

Results presented in §3.2 indicate that the Nataf-Beta Random Field Classifier is suited as a general purpose classification approach for real-continuous and real-integer attribute value problems. Note that this performance is achieved despite the following simplifying assumptions:

1. Only the MAP estimates of $a(\mathbf{x})$ and $b(\mathbf{x})$ are employed.

2. Datasets containing more than two classes are analyzed as multiple two-class problems.

3. Missing data are replaced by the corresponding attribute mean value.

4. The identification of the length scale $\mathbf{l}_p$ is omitted.

5. The number of parameters is reduced by assuming that $\mathbf{l}_a = \mathbf{l}_b$.

All these simplifications can be relaxed at the expense of computational resources. Regarding the second simplification, a direct analysis of multi-class problems is possible by using a Dirichlet pdf instead of the Beta pdf employed here. This extension is beyond the scope of this paper.

It is important to note that, like other classification methods, this one is not immune to the curse of dimensionality [8]. As the number of attributes ($\mathtt{x}$) increases, the sparsity of a dataset over the $\mathtt{x}$-dimensional attribute space increases exponentially. Guidance on how the accuracy of the proposed approach varies as a function of dataset sparsity is beyond the scope of this paper.
6 Conclusion


The Nataf-Beta Random Field Classifier is suited as a general purpose classification approach for real-continuous and real-integer attribute value problems. Although its classification accuracy does not statistically outperform the best accuracy reported in the literature, it consistently ranks among the top-tier classification accuracies for the six benchmark datasets tested. The main strength of the approach resides in its formulation, which extends the applicability of the Beta conjugate prior to classification problems.